
8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF #21083

Closed


Hamlin-Li commented Sep 19, 2024

Hi,
Can you help to review this patch?
Thanks!

This patch is based on #20781, which added the SLEEF source (in particular, the generated SLEEF inline headers). We use the SLEEF API to vectorize the math operations in the Vector API.

On a RISC-V build machine whose compiler supports vector intrinsics (e.g. GCC 14+), the build generates libsleef.so with the bridge functions to the SLEEF API; otherwise, libsleef.so is generated without the bridge functions.
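Whether a particular libsleef.so was built with or without the bridge functions can be checked by looking at its exported symbols (for example with `nm -D`). Below is a minimal Python sketch of the same check using `ctypes`. It assumes the library is loaded on a host that can run it (a RISC-V machine for a RISC-V build), and the names in `CANDIDATE_SYMBOLS` are hypothetical placeholders, not the names this patch actually exports.

```python
import ctypes
import ctypes.util

# Hypothetical bridge-function names, for illustration only. The real names
# depend on the patch; list them with `nm -D libsleef.so` before relying on this.
CANDIDATE_SYMBOLS = ["Sleef_expfx_u10rvv", "Sleef_expdx_u10rvv"]


def exported_symbols(lib_path, names):
    """Map each name to True if the shared library exports it, else False."""
    lib = ctypes.CDLL(lib_path)  # dlopen()s the library; raises OSError if it cannot load
    # ctypes raises AttributeError for undefined symbols, so hasattr() is a presence test.
    return {name: hasattr(lib, name) for name in names}


def has_bridge_functions(lib_path, names=CANDIDATE_SYMBOLS):
    """Treat the build as 'with bridge functions' only if every candidate is exported."""
    return all(exported_symbols(lib_path, names).values())


if __name__ == "__main__":
    # Demonstrate the probe on the C math library, which exports cos but not the made-up name.
    libm = ctypes.util.find_library("m")
    print(exported_symbols(libm, ["cos", "no_such_symbol_xyz"]))
```

Against a real build, the call would be `has_bridge_functions("/path/to/libsleef.so")`, with the path and symbol list adjusted to the actual JDK image.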

Test

test/jdk/jdk/incubator/vector

Performance

Data collected on a Banana Pi (bananapi):

Benchmark Size Mode Cnt Score(+intrinsic) Error(+intrinsic) Score(-intrinsic) Error(-intrinsic) Units Improvement
Double128Vector.ACOS 1024 avgt 10 112444.388 655.761 208554.742 1508.709 ns/op 1.855
Double128Vector.ASIN 1024 avgt 10 104121.259 243.167 208314.499 2833.61 ns/op 2.001
Double128Vector.ATAN 1024 avgt 10 136941.263 243.486 284024.53 2204.224 ns/op 2.074
Double128Vector.ATAN2 1024 avgt 10 163228.681 435.455 427589.587 3045.192 ns/op 2.62
Double128Vector.CBRT 1024 avgt 10 146395.753 239.355 317136.654 1330.869 ns/op 2.166
Double128Vector.COS 1024 avgt 10 154865.298 235.697 305721.518 1319.313 ns/op 1.974
Double128Vector.COSH 1024 avgt 10 189212.943 262.399 220756.27 61324.863 ns/op 1.167
Double128Vector.EXP 1024 avgt 10 113941.594 219.647 252853.07 891.272 ns/op 2.219
Double128Vector.EXPM1 1024 avgt 10 184552.939 513.715 254087.184 2144.997 ns/op 1.377
Double128Vector.HYPOT 1024 avgt 10 111580.194 423.282 374537.338 2091.811 ns/op 3.357
Double128Vector.LOG 1024 avgt 10 110680.548 192.731 265391.129 2653.519 ns/op 2.398
Double128Vector.LOG10 1024 avgt 10 116708.105 167.095 285764.405 2489.08 ns/op 2.449
Double128Vector.LOG1P 1024 avgt 10 115633.302 567.7 317235.967 1062.848 ns/op 2.743
Double128Vector.POW 1024 avgt 10 321655.14 36.55 560765.066 2669.33 ns/op 1.743
Double128Vector.SIN 1024 avgt 10 166240.988 512.253 287741.373 2089.286 ns/op 1.731
Double128Vector.SINH 1024 avgt 10 196233.614 225.88 221493.573 60941.438 ns/op 1.129
Double128Vector.TAN 1024 avgt 10 203347.384 267.385 372912.183 2093.675 ns/op 1.834
Double128Vector.TANH 1024 avgt 10 195587.19 5260.844 190723.4 873.135 ns/op 0.975
Double256Vector.ACOS 1024 avgt 10 55282.885 8.888 138468.959 1342.937 ns/op 2.505
Double256Vector.ASIN 1024 avgt 10 51424.997 22.614 141245.24 3213.405 ns/op 2.747
Double256Vector.ATAN 1024 avgt 10 70385.397 14.196 210226.648 897.412 ns/op 2.987
Double256Vector.ATAN2 1024 avgt 10 83098.264 120.424 373363.523 3093.761 ns/op 4.493
Double256Vector.CBRT 1024 avgt 10 72695.917 28.785 250843.027 869.34 ns/op 3.451
Double256Vector.COS 1024 avgt 10 77373.4 10.275 249779.557 1143.93 ns/op 3.228
Double256Vector.COSH 1024 avgt 10 95626.561 169.093 135295.836 26164.804 ns/op 1.415
Double256Vector.EXP 1024 avgt 10 57013.105 25.681 169211.888 1723.985 ns/op 2.968
Double256Vector.EXPM1 1024 avgt 10 89929.364 172.868 189713.959 619.662 ns/op 2.11
Double256Vector.HYPOT 1024 avgt 10 58179.576 72.265 253002.315 1413.97 ns/op 4.349
Double256Vector.LOG 1024 avgt 10 55274.107 6.781 199552.499 1070.838 ns/op 3.61
Double256Vector.LOG10 1024 avgt 10 58321.206 3.046 219497.134 1784.676 ns/op 3.764
Double256Vector.LOG1P 1024 avgt 10 59457.661 4.266 248897.335 1101.141 ns/op 4.186
Double256Vector.POW 1024 avgt 10 161727.792 283.278 389901.211 5128.643 ns/op 2.411
Double256Vector.SIN 1024 avgt 10 82028.764 163.402 229585.318 2284.46 ns/op 2.799
Double256Vector.SINH 1024 avgt 10 95533.939 144.219 138338.01 32257.269 ns/op 1.448
Double256Vector.TAN 1024 avgt 10 100587.595 175.454 255335.96 2392.867 ns/op 2.538
Double256Vector.TANH 1024 avgt 10 122826.824 8132.31 116587.352 20456.614 ns/op 0.949
Double512Vector.ACOS 1024 avgt 10 100644.726 6559.453 90596.17 6201.774 ns/op 0.9
Double512Vector.ASIN 1024 avgt 10 97781.73 6454.561 81923.501 6875.259 ns/op 0.838
Double512Vector.ATAN 1024 avgt 10 230365.297 5657.262 231136.108 8201.677 ns/op 1.003
Double512Vector.ATAN2 1024 avgt 10 330644.739 965.308 334507.514 1871.147 ns/op 1.012
Double512Vector.CBRT 1024 avgt 10 269499.416 7578.3 275058.533 2931.999 ns/op 1.021
Double512Vector.COS 1024 avgt 10 250239.661 8717.098 251643.64 5974.845 ns/op 1.006
Double512Vector.COSH 1024 avgt 10 130896.571 3149.555 116148.85 19419.192 ns/op 0.887
Double512Vector.EXP 1024 avgt 10 167358.383 4017.309 163777.077 9441.332 ns/op 0.979
Double512Vector.EXPM1 1024 avgt 10 180627.099 6239.875 181451.788 2833.293 ns/op 1.005
Double512Vector.HYPOT 1024 avgt 10 259838.022 2413.253 253563.622 5666.461 ns/op 0.976
Double512Vector.LOG 1024 avgt 10 214492.394 8551.06 223659.532 4634.475 ns/op 1.043
Double512Vector.LOG10 1024 avgt 10 237482.746 5504.954 241056.068 3773.962 ns/op 1.015
Double512Vector.LOG1P 1024 avgt 10 259562.363 6983.428 255542.226 6799.872 ns/op 0.985
Double512Vector.POW 1024 avgt 10 409067.718 1031.843 415598.626 1333.94 ns/op 1.016
Double512Vector.SIN 1024 avgt 10 233720.922 9117.177 237166.138 5740.104 ns/op 1.015
Double512Vector.SINH 1024 avgt 10 106110.446 8082.622 120441.14 19807.6 ns/op 1.135
Double512Vector.TAN 1024 avgt 10 286363.576 5171.85 289463.344 5786.435 ns/op 1.011
Double512Vector.TANH 1024 avgt 10 55621.25 1751.435 54999.583 114.941 ns/op 0.989
Double64Vector.ACOS 1024 avgt 10 440775.699 2196.779 448951.428 1820.252 ns/op 1.019
Double64Vector.ASIN 1024 avgt 10 463051.606 2394.98 454351.539 2086.492 ns/op 0.981
Double64Vector.ATAN 1024 avgt 10 544190.664 4013.885 546309.02 3440.376 ns/op 1.004
Double64Vector.ATAN2 1024 avgt 10 799967.835 3488.851 812483.613 2421.999 ns/op 1.016
Double64Vector.CBRT 1024 avgt 10 618953.967 5293.167 622328.702 2301.048 ns/op 1.005
Double64Vector.COS 1024 avgt 10 574667.991 2894.881 604963.23 12128.549 ns/op 1.053
Double64Vector.COSH 1024 avgt 10 480884.659 3050.01 474405.728 2223.766 ns/op 0.987
Double64Vector.EXP 1024 avgt 10 476743.952 1468.7 493014.212 2879.845 ns/op 1.034
Double64Vector.EXPM1 1024 avgt 10 522048.987 2879.475 505978.67 1825.956 ns/op 0.969
Double64Vector.HYPOT 1024 avgt 10 713841.457 2816.621 716284.872 7024.984 ns/op 1.003
Double64Vector.LOG 1024 avgt 10 523702.517 1849.651 525498.61 1122.938 ns/op 1.003
Double64Vector.LOG10 1024 avgt 10 539968.004 2445.033 541415.051 2966.057 ns/op 1.003
Double64Vector.LOG1P 1024 avgt 10 556206.02 3156.961 554613.942 2628.038 ns/op 0.997
Double64Vector.POW 1024 avgt 10 931275.694 5378.585 914787.042 11244.374 ns/op 0.982
Double64Vector.SIN 1024 avgt 10 620118.172 3805.705 553147.004 2265.843 ns/op 0.892
Double64Vector.SINH 1024 avgt 10 504218.91 2259.924 482680.497 5218.21 ns/op 0.957
Double64Vector.TAN 1024 avgt 10 620591.643 5541.53 622098.336 4892.394 ns/op 1.002
Double64Vector.TANH 1024 avgt 10 438766.135 4313.069 426783.749 5986.632 ns/op 0.973
DoubleMaxVector.ACOS 1024 avgt 10 55281.88 5.819 152707.139 2337.434 ns/op 2.762
DoubleMaxVector.ASIN 1024 avgt 10 51632.365 20.723 152958.169 2530.258 ns/op 2.962
DoubleMaxVector.ATAN 1024 avgt 10 70393.309 7.502 225146.6 4836.393 ns/op 3.198
DoubleMaxVector.ATAN2 1024 avgt 10 83049.389 131.221 376129.104 2973.54 ns/op 4.529
DoubleMaxVector.CBRT 1024 avgt 10 73401.993 20.547 252789.351 1322.396 ns/op 3.444
DoubleMaxVector.COS 1024 avgt 10 77388.046 8.768 252428.563 4605.328 ns/op 3.262
DoubleMaxVector.COSH 1024 avgt 10 95373.866 167.177 145355.624 35146.538 ns/op 1.524
DoubleMaxVector.EXP 1024 avgt 10 57910.881 11.031 183133.879 3721.502 ns/op 3.162
DoubleMaxVector.EXPM1 1024 avgt 10 89968.248 180.822 199712.477 2009.862 ns/op 2.22
DoubleMaxVector.HYPOT 1024 avgt 10 59064.115 186.157 253275.967 1479.124 ns/op 4.288
DoubleMaxVector.LOG 1024 avgt 10 53685.913 4.08 202019.279 1174.832 ns/op 3.763
DoubleMaxVector.LOG10 1024 avgt 10 58333.057 4.644 223237.023 2682.561 ns/op 3.827
DoubleMaxVector.LOG1P 1024 avgt 10 59455.511 4.493 248216.075 4200.623 ns/op 4.175
DoubleMaxVector.POW 1024 avgt 10 161793.312 355.543 395000.995 4070.581 ns/op 2.441
DoubleMaxVector.SIN 1024 avgt 10 82045.108 178.173 232964.02 3351.878 ns/op 2.839
DoubleMaxVector.SINH 1024 avgt 10 95557.571 171.167 139434.904 33020.695 ns/op 1.459
DoubleMaxVector.TAN 1024 avgt 10 99139.084 170.106 255665.125 1463.226 ns/op 2.579
DoubleMaxVector.TANH 1024 avgt 10 122556.944 7304.643 112638.697 22789.428 ns/op 0.919
DoubleScalar.ACOS 1024 avgt 10 35364.49 43.834 35391.461 11.475 ns/op 1.001
DoubleScalar.ASIN 1024 avgt 10 36020.676 41.123 36040.44 22.284 ns/op 1.001
DoubleScalar.ATAN 1024 avgt 10 100104.331 135.729 102039.921 286.803 ns/op 1.019
DoubleScalar.ATAN2 1024 avgt 10 163987.639 239.624 165832.456 1865.186 ns/op 1.011
DoubleScalar.CBRT 1024 avgt 10 144175.051 169.152 144177.588 175.837 ns/op 1
DoubleScalar.COS 1024 avgt 10 129137.254 186.072 129187.403 164.344 ns/op 1
DoubleScalar.COSH 1024 avgt 10 65408.411 158.758 65469.654 302.387 ns/op 1.001
DoubleScalar.EXP 1024 avgt 10 66358.519 15.942 66370.088 13.886 ns/op 1
DoubleScalar.EXPM1 1024 avgt 10 84449.659 20.205 84443.216 17.539 ns/op 1
DoubleScalar.HYPOT 1024 avgt 10 98996.854 149.906 99114.226 247.392 ns/op 1.001
DoubleScalar.LOG 1024 avgt 10 92296.061 84.554 92362.323 127.4 ns/op 1.001
DoubleScalar.LOG10 1024 avgt 10 108959.603 214.845 109177.708 151.172 ns/op 1.002
DoubleScalar.LOG1P 1024 avgt 10 133745.827 189.726 133626.747 159.786 ns/op 0.999
DoubleScalar.POW 1024 avgt 10 245735.03 392.669 246363.909 776.007 ns/op 1.003
DoubleScalar.SIN 1024 avgt 10 112985.666 211.564 113015.922 93.048 ns/op 1
DoubleScalar.SINH 1024 avgt 10 65009.526 547.157 65443.714 150.434 ns/op 1.007
DoubleScalar.TAN 1024 avgt 10 163437.236 157.673 163196.802 196.316 ns/op 0.999
DoubleScalar.TANH 1024 avgt 10 15174.949 7.999 15178.266 19.863 ns/op 1
Float128Vector.ACOS 1024 avgt 10 43372.933 5.055 126575.586 976.159 ns/op 2.918
Float128Vector.ASIN 1024 avgt 10 38632.619 1.743 127126.175 1368.112 ns/op 3.291
Float128Vector.ATAN 1024 avgt 10 56269.042 3.274 188537.782 1465.567 ns/op 3.351
Float128Vector.ATAN2 1024 avgt 10 64863.602 9.184 289789.784 1933.189 ns/op 4.468
Float128Vector.CBRT 1024 avgt 10 60648.572 30.499 219496.505 2628.005 ns/op 3.619
Float128Vector.COS 1024 avgt 10 90296.6 173.89 193875.308 2878.795 ns/op 2.147
Float128Vector.COSH 1024 avgt 10 72513.407 13.428 134362.41 28085.258 ns/op 1.853
Float128Vector.EXP 1024 avgt 10 32520.847 6.845 158283.434 1092.762 ns/op 4.867
Float128Vector.EXPM1 1024 avgt 10 65130.005 3.498 186841.627 1140.313 ns/op 2.869
Float128Vector.HYPOT 1024 avgt 10 52240.243 4.423 228928.31 1385.126 ns/op 4.382
Float128Vector.LOG 1024 avgt 10 44080.307 2.549 186830.712 797.576 ns/op 4.238
Float128Vector.LOG10 1024 avgt 10 45302.969 7.095 189605.302 2126.429 ns/op 4.185
Float128Vector.LOG1P 1024 avgt 10 47599.582 3.822 194058.394 2620.35 ns/op 4.077
Float128Vector.POW 1024 avgt 10 118329.731 157.834 375914.2 2800.253 ns/op 3.177
Float128Vector.SIN 1024 avgt 10 96545.285 409.639 190830.529 1511.452 ns/op 1.977
Float128Vector.SINH 1024 avgt 10 67999.296 8.793 134817.031 28316.519 ns/op 1.983
Float128Vector.TAN 1024 avgt 10 105051.902 193.021 236690.576 6686.38 ns/op 2.253
Float128Vector.TANH 1024 avgt 10 107938.486 1593.331 107867.708 1037.358 ns/op 0.999
Float256Vector.ACOS 1024 avgt 10 21993.336 0.945 90171.186 765.896 ns/op 4.1
Float256Vector.ASIN 1024 avgt 10 19176.439 4.288 91491.757 946.887 ns/op 4.771
Float256Vector.ATAN 1024 avgt 10 28573.58 1.788 153126.232 1354.054 ns/op 5.359
Float256Vector.ATAN2 1024 avgt 10 32809.207 57.366 241229.586 2703.039 ns/op 7.352
Float256Vector.CBRT 1024 avgt 10 30349.65 5.52 195162.623 2631.134 ns/op 6.43
Float256Vector.COS 1024 avgt 10 45629.146 6.614 185366.17 1616.6 ns/op 4.062
Float256Vector.COSH 1024 avgt 10 36923.595 2.135 108690.335 13921.018 ns/op 2.944
Float256Vector.EXP 1024 avgt 10 16170.263 2.046 125594.096 1033.554 ns/op 7.767
Float256Vector.EXPM1 1024 avgt 10 32608.2 5.484 129709.448 993.492 ns/op 3.978
Float256Vector.HYPOT 1024 avgt 10 27921.801 1.528 190117.16 1454.543 ns/op 6.809
Float256Vector.LOG 1024 avgt 10 22076.681 2.329 134540.724 1704.931 ns/op 6.094
Float256Vector.LOG10 1024 avgt 10 23064.284 2.37 159962.122 2503.179 ns/op 6.935
Float256Vector.LOG1P 1024 avgt 10 23835.965 2.04 194624.332 4779.995 ns/op 8.165
Float256Vector.POW 1024 avgt 10 59593.468 74.705 317616.881 1183.352 ns/op 5.33
Float256Vector.SIN 1024 avgt 10 48733.012 19.4 169500.443 2768.932 ns/op 3.478
Float256Vector.SINH 1024 avgt 10 33625.182 1.423 124512.293 1771.11 ns/op 3.703
Float256Vector.TAN 1024 avgt 10 54313.62 14.978 215172.493 1753.706 ns/op 3.962
Float256Vector.TANH 1024 avgt 10 61708.469 1605.348 63690.609 796.163 ns/op 1.032
Float512Vector.ACOS 1024 avgt 10 93820.934 3011.58 90663.418 2027.372 ns/op 0.966
Float512Vector.ASIN 1024 avgt 10 95866.984 3057.351 97612.203 3787.454 ns/op 1.018
Float512Vector.ATAN 1024 avgt 10 167859.888 4240.703 167247.975 4300.418 ns/op 0.996
Float512Vector.ATAN2 1024 avgt 10 255441.315 685.737 254700.612 4896.306 ns/op 0.997
Float512Vector.CBRT 1024 avgt 10 214410.72 931.01 214285.796 1383.406 ns/op 0.999
Float512Vector.COS 1024 avgt 10 196689.274 1880.854 197309.067 1784.865 ns/op 1.003
Float512Vector.COSH 1024 avgt 10 104335.896 561.089 88993.056 1788.606 ns/op 0.853
Float512Vector.EXP 1024 avgt 10 135852.89 2981.107 135877.338 2846.752 ns/op 1
Float512Vector.EXPM1 1024 avgt 10 152498.16 2995.351 153719.922 2343.672 ns/op 1.008
Float512Vector.HYPOT 1024 avgt 10 188872.565 802.938 188659.105 505.853 ns/op 0.999
Float512Vector.LOG 1024 avgt 10 159618.453 2347.331 159789.006 3077.534 ns/op 1.001
Float512Vector.LOG10 1024 avgt 10 177141.543 2208.144 173862.555 7986.955 ns/op 0.981
Float512Vector.LOG1P 1024 avgt 10 201767.835 2097.682 194773.996 3261.783 ns/op 0.965
Float512Vector.POW 1024 avgt 10 340428.997 898.57 339608.679 2319.682 ns/op 0.998
Float512Vector.SIN 1024 avgt 10 182644.272 2827.997 183512.561 3230.558 ns/op 1.005
Float512Vector.SINH 1024 avgt 10 88864.538 856.677 96766.798 4680.862 ns/op 1.089
Float512Vector.TAN 1024 avgt 10 230591.607 2406.274 235481.617 2062.326 ns/op 1.021
Float512Vector.TANH 1024 avgt 10 41323.35 1108.87 41397.969 105.838 ns/op 1.002
Float64Vector.ACOS 1024 avgt 10 87808.157 135.816 197215.577 1427.244 ns/op 2.246
Float64Vector.ASIN 1024 avgt 10 79126.845 9.019 197556.29 786.402 ns/op 2.497
Float64Vector.ATAN 1024 avgt 10 112334.153 161.759 262670.28 1106.929 ns/op 2.338
Float64Vector.ATAN2 1024 avgt 10 132755.668 148.942 422308.023 1739.683 ns/op 3.181
Float64Vector.CBRT 1024 avgt 10 121393.777 462.316 311727.38 2783.373 ns/op 2.568
Float64Vector.COS 1024 avgt 10 180332.5 204.792 311139.369 2435.05 ns/op 1.725
Float64Vector.COSH 1024 avgt 10 145071.014 281.11 219063.334 1330.185 ns/op 1.51
Float64Vector.EXP 1024 avgt 10 64474.087 18.916 222943.443 2074.818 ns/op 3.458
Float64Vector.EXPM1 1024 avgt 10 128611.56 230.073 242737.624 1997.438 ns/op 1.887
Float64Vector.HYPOT 1024 avgt 10 104683.692 161.234 324578.297 2702.814 ns/op 3.101
Float64Vector.LOG 1024 avgt 10 88124.496 142.168 252264.027 536.035 ns/op 2.863
Float64Vector.LOG10 1024 avgt 10 95184.783 184.6 270674.746 1399.203 ns/op 2.844
Float64Vector.LOG1P 1024 avgt 10 91969.404 1086.102 310777.655 2490.714 ns/op 3.379
Float64Vector.POW 1024 avgt 10 237248.014 1684.478 472070.731 2933.214 ns/op 1.99
Float64Vector.SIN 1024 avgt 10 194778.558 470.935 281942.775 1400.795 ns/op 1.448
Float64Vector.SINH 1024 avgt 10 137944.677 202.705 222200.113 1312.19 ns/op 1.611
Float64Vector.TAN 1024 avgt 10 212713.608 218.316 313379.409 2596.888 ns/op 1.473
Float64Vector.TANH 1024 avgt 10 173926.377 1685.093 174629.554 3082.909 ns/op 1.004
FloatMaxVector.ACOS 1024 avgt 10 21889.905 39.906 90252.786 418.764 ns/op 4.123
FloatMaxVector.ASIN 1024 avgt 10 18793.467 4.566 90741.587 741.291 ns/op 4.828
FloatMaxVector.ATAN 1024 avgt 10 28496.993 6.548 153581.577 1744.674 ns/op 5.389
FloatMaxVector.ATAN2 1024 avgt 10 33658.989 3.092 258396.05 5256.453 ns/op 7.677
FloatMaxVector.CBRT 1024 avgt 10 30350.281 1.956 197139.203 2129.485 ns/op 6.495
FloatMaxVector.COS 1024 avgt 10 45628.863 3.576 187231.562 1821.847 ns/op 4.103
FloatMaxVector.COSH 1024 avgt 10 36925.011 5.202 108522.288 13952.184 ns/op 2.939
FloatMaxVector.EXP 1024 avgt 10 16173.603 1.355 126495.517 621.715 ns/op 7.821
FloatMaxVector.EXPM1 1024 avgt 10 32651.571 16.621 129689.32 2807.684 ns/op 3.972
FloatMaxVector.HYPOT 1024 avgt 10 28246.148 2.652 190196.415 1919.742 ns/op 6.734
FloatMaxVector.LOG 1024 avgt 10 22078.034 5.796 137138.837 3034.756 ns/op 6.212
FloatMaxVector.LOG10 1024 avgt 10 23840.747 6.694 164126.237 2423.198 ns/op 6.884
FloatMaxVector.LOG1P 1024 avgt 10 22993.078 5.389 190345.701 1427.37 ns/op 8.278
FloatMaxVector.POW 1024 avgt 10 58727.04 133.816 316392.473 1713.258 ns/op 5.388
FloatMaxVector.SIN 1024 avgt 10 48729.964 4.439 168993.476 2234.287 ns/op 3.468
FloatMaxVector.SINH 1024 avgt 10 33635.008 2.951 117198.021 2609.295 ns/op 3.484
FloatMaxVector.TAN 1024 avgt 10 54314.082 14.057 213847.082 1390.695 ns/op 3.937
FloatMaxVector.TANH 1024 avgt 10 65545.343 1419.074 65362.648 1729.148 ns/op 0.997
FloatScalar.ACOS 1024 avgt 10 36607.495 4.656 36661.257 20.306 ns/op 1.001
FloatScalar.ASIN 1024 avgt 10 37281.012 28.006 37272.249 46.647 ns/op 1
FloatScalar.ATAN 1024 avgt 10 101949.284 327.101 103939.277 166.274 ns/op 1.02
FloatScalar.ATAN2 1024 avgt 10 165461.209 1727.043 163270.286 593.568 ns/op 0.987
FloatScalar.CBRT 1024 avgt 10 148653.826 166.069 148638.661 171.44 ns/op 1
FloatScalar.COS 1024 avgt 10 129975.842 204.494 129889.093 123.915 ns/op 0.999
FloatScalar.COSH 1024 avgt 10 67462.124 25.755 67761.353 12.415 ns/op 1.004
FloatScalar.EXP 1024 avgt 10 67723.964 8.617 67720.157 15.651 ns/op 1
FloatScalar.EXPM1 1024 avgt 10 85058.759 97.872 84612.5 16.527 ns/op 0.995
FloatScalar.HYPOT 1024 avgt 10 99875.247 713.526 99915.975 607.926 ns/op 1
FloatScalar.LOG 1024 avgt 10 94004.039 20.602 93571.942 124.254 ns/op 0.995
FloatScalar.LOG10 1024 avgt 10 110012.901 232.67 110132.542 476.101 ns/op 1.001
FloatScalar.LOG1P 1024 avgt 10 134646.067 809.2 134554.613 651.85 ns/op 0.999
FloatScalar.POW 1024 avgt 10 246303.685 269.215 246268.844 294.437 ns/op 1
FloatScalar.SIN 1024 avgt 10 115767.708 300.899 116114.497 209.168 ns/op 1.003
FloatScalar.SINH 1024 avgt 10 68118.234 190.657 68973.513 289.182 ns/op 1.013
FloatScalar.TAN 1024 avgt 10 164639.016 323.546 164428.942 175.806 ns/op 0.999
FloatScalar.TANH 1024 avgt 10 17730.106 10.645 17730.258 9.184 ns/op 1
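The Improvement column appears to be the no-intrinsic score divided by the with-intrinsic score; for Double128Vector.ACOS, 208554.742 / 112444.388 ≈ 1.855. Because the unit is ns/op (lower is better), values above 1 mean the SLEEF-backed intrinsic is faster. The minimal Python sketch below recomputes that ratio for a few rows copied from the table. It also adds a rough noise check that is not part of the original data: it treats each Error value as a ± half-width and flags rows whose two intervals overlap.

```python
# A few rows copied from the table above:
# (benchmark, score +intrinsic, error +intrinsic, score -intrinsic, error -intrinsic), in ns/op.
ROWS = [
    ("Double128Vector.ACOS", 112444.388, 655.761, 208554.742, 1508.709),
    ("Double128Vector.TANH", 195587.19, 5260.844, 190723.4, 873.135),
    ("Float256Vector.EXP", 16170.263, 2.046, 125594.096, 1033.554),
]


def improvement(score_with, score_without):
    """Speedup factor: time per op without the intrinsic over time per op with it."""
    return score_without / score_with


def within_noise(score_with, err_with, score_without, err_without):
    """Rough check: True if the score +/- error intervals overlap."""
    return abs(score_without - score_with) <= err_with + err_without


if __name__ == "__main__":
    for name, s_with, e_with, s_without, e_without in ROWS:
        ratio = improvement(s_with, s_without)
        note = "within noise" if within_noise(s_with, e_with, s_without, e_without) else ""
        print(f"{name:24s} {ratio:6.3f} {note}")
```

For these three rows the ratios round to 1.855, 0.975 and 7.767, matching the table, and only the TANH row falls within noise.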

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF (Task - P4)


Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083
$ git checkout pull/21083

Update a local copy of the PR:
$ git checkout pull/21083
$ git pull https://git.openjdk.org/jdk.git pull/21083/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21083

View PR using the GUI difftool:
$ git pr show -t 21083

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21083.diff


bridgekeeper bot commented Sep 19, 2024

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@Hamlin-Li (Author)

Hi @magicus, could you please take a look at the make part? Thanks!

openjdk bot commented Sep 19, 2024

@Hamlin-Li This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF

Reviewed-by: luhenry, ihse, erikj, fyang, rehn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time this comment was updated, there had been 19 new commits pushed to the master branch.

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk bot added the rfr label (Pull request is ready for review) Sep 19, 2024
openjdk bot commented Sep 19, 2024

@Hamlin-Li The following labels will be automatically applied to this pull request:

  • build
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge bot commented Sep 19, 2024

luhenry (Member) left a comment

Glad to see it come to fruition!

erikj79 (Member) left a comment

Build changes look ok.

/reviewers 2

openjdk bot added the ready label (Pull request is ready to be integrated) Sep 23, 2024
openjdk bot commented Sep 23, 2024

@erikj79
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

openjdk bot removed the ready label (Pull request is ready to be integrated) Sep 23, 2024
@Hamlin-Li (Author)

> Build changes look ok.
>
> /reviewers 2

Thanks for your review!

RealFYang (Member) left a comment

Hi, I have several comments after a cursory look. Please consider.

src/hotspot/cpu/riscv/assembler_riscv.hpp (outdated)
src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp (outdated)
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp (outdated)
src/hotspot/cpu/riscv/riscv.ad (outdated)
RealFYang (Member) left a comment

Thanks for the update. The RISC-V part LGTM. You still need another reviewer for the rest of the change.

openjdk bot removed the merge-conflict label (Pull request has merge conflict with target branch) Oct 7, 2024
robehn (Contributor) left a comment

Not an in-depth review, but looks good.
Thanks for your work, and patience!

@Hamlin-Li (Author)

Thanks, everyone, for your reviews!

I'll push this later, after a final check of the performance with the latest patch.

@Hamlin-Li (Author)

/integrate

openjdk bot commented Oct 8, 2024

@Hamlin-Li This pull request has not yet been marked as ready for integration.

@Hamlin-Li (Author)

@erikj79 @RealFYang Hi, could you help re-approve the PR? I guess it's needed because of the new "re-review/approve" rule. Thanks!

@Hamlin-Li (Author)

@luhenry I merged master into this PR to resolve a conflict after your approval, so I think I also need your re-approval. Thanks!

openjdk bot added the ready label (Pull request is ready to be integrated) Oct 8, 2024
make/modules/jdk.incubator.vector/Lib.gmk (outdated)
magicus (Member) commented Oct 8, 2024

I apologize for bringing in new requests for changes when you were so close to integrating. I understand it must be frustrating. Hopefully, it's a trivial fix and I can re-approve the new version as soon as you commit it.

@Hamlin-Li (Author)

I've run some jtreg tests and checked the generated libsleef.so; it should be good. I did not re-run the JMH tests, as that seems unnecessary.

openjdk bot removed the ready label (Pull request is ready to be integrated) Oct 8, 2024
magicus (Member) left a comment

Now it looks good! Thanks!

openjdk bot added the ready label (Pull request is ready to be integrated) Oct 8, 2024
@Hamlin-Li (Author)

Sorry for the inconvenience. Thanks for your quick re-reviews!

/integrate

openjdk bot commented Oct 8, 2024

Going to push as commit 580eb62.
Since your change was applied, there have been 19 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

openjdk bot added the integrated label (Pull request has been integrated) Oct 8, 2024
openjdk bot closed this Oct 8, 2024
openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Oct 8, 2024
openjdk bot commented Oct 8, 2024

@Hamlin-Li Pushed as commit 580eb62.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Hamlin-Li deleted the sleef-riscv64-integrate-source-f branch October 8, 2024 17:12
nick-arm (Contributor) commented Oct 9, 2024

@Hamlin-Li are you still planning to re-submit the AArch64 backend changes that were in #18605?

@Hamlin-Li (Author)

Yes, I could probably send out for review next week.

7 participants